Skip to content

Comments

Add checkpoint discovery hints to agent prompts#894

Merged
jwbron merged 15 commits intomainfrom
egg/issue-887
Feb 23, 2026
Merged

Add checkpoint discovery hints to agent prompts#894
jwbron merged 15 commits intomainfrom
egg/issue-887

Conversation

@james-in-a-box
Copy link
Contributor

Improve checkpoint discoverability for agents

Agents rarely use egg-checkpoint because nothing in their prompts tells them to.
The CLI works, documentation exists, and a Claude Code rule is present — but agents
only discover checkpoints if they happen to read the right docs. This change surfaces
checkpoint commands directly in agent prompts and mode documentation so every agent
session automatically knows how to query prior work.

Changes:

  • Orchestrator prompts (orchestrator/routes/pipelines.py): Added checkpoint
    discovery hints to _build_role_context, _build_agent_prompt (tester, documenter,
    integrator roles), and _build_phase_scoped_prompt (revision cycles with failed
    session lookup).
  • Agent mode commands: Added checkpoint browsing sections to tester-mode.md,
    documenter-mode.md, integrator-mode.md, and coder-mode.md (revision cycle).
  • Rules: Updated checkpoint.md with a "When to Use" section mapping each role
    to its most useful checkpoint commands. Updated mission.md to list checkpoints
    as a context source and mention them in the gather-context workflow.
  • Documentation: Expanded docs/guides/checkpoint-access.md with documenter and
    coder-revision workflow examples, plus cost command for integrators.
  • Tests: Added test_checkpoint_discovery.py (460 lines) verifying all prompt
    hints are present, with a conftest.py fix for sys.modules mock isolation.

Closes #887

Issue: #887

Test plan:

  • Run pytest orchestrator/tests/test_checkpoint_discovery.py — all tests pass
  • Verify tester/documenter/integrator prompts contain egg-checkpoint commands
  • Verify revision-cycle prompt includes failed session lookup hint
  • Check agent mode .md files contain checkpoint sections

Authored-by: egg

egg-orchestrator added 14 commits February 23, 2026 07:16
Container ecb438f293a0195cad54e4f5fe6dc8c1371af55ce05af32dd1dfefbe82427861 exited with uncommitted changes.
This commit preserves the agent's work-in-progress.

Authored-by: egg
Container 302feb3d8b2a3c5204f2f8d40b381654e890ed79dd73f4a45e788f1bc5531258 exited with uncommitted changes.
This commit preserves the agent's work-in-progress.

Authored-by: egg
Add role-specific checkpoint discovery hints at three levels:

1. Orchestrator prompts (auto-injected into every agent session):
   - _build_role_context(): checkpoint pointer in For More Context section
   - _build_agent_prompt(): tester gets coder checkpoint list, documenter
     gets context --files, integrator gets context and cost commands
   - _build_phase_scoped_prompt(): revision checklist gets failed-session hint

2. Agent mode commands (supplementary reference):
   - tester-mode.md: Review Prior Work section
   - integrator-mode.md: Pipeline Overview section
   - documenter-mode.md: Find Changed Files section
   - coder-mode.md: Revision Cycle Context section

3. Mission and checkpoint rules (baseline awareness):
   - mission.md: Checkpoints row in context sources table, checkpoint
     hint in Gather context workflow step
   - checkpoint.md: When to Use section with role-specific guidance

All changes are additive text. No logic changes, no schema migrations.
All 1331 orchestrator tests pass.
Import the real docker module in orchestrator/tests/conftest.py before
test collection so that modules using sys.modules.setdefault("docker",
MagicMock()) don't shadow the real package. This prevents
docker_client.NotFound et al. from being bound to MagicMock objects
which aren't BaseException subclasses and break except clauses.
Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No agent-mode design concerns. The changes are well-aligned with the agent-mode design guidelines.

The PR adds short checkpoint discovery hints (1-3 lines each) to orchestrator prompts, agent mode commands, and rules. These are orientation, not constraint — they tell agents when to use an existing tool (egg-checkpoint) without pre-fetching data, requiring structured output, or micromanaging procedure. The hints use suggestive language ("Before writing tests, review...") rather than imperative mandates, and reference the existing checkpoint.md rule for details rather than duplicating content. Analysis roles are correctly excluded.

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Add checkpoint discovery hints to agent prompts

Verdict: Approve (with minor suggestions)

Low-risk PR that adds text-only checkpoint discovery hints to orchestrator prompts, agent mode commands, rules, and documentation. No logic changes, no schema migrations. The approach is sound — addressing discoverability at the highest-leverage injection points (orchestrator prompts) rather than over-engineering with new tooling.

Issues Found

1. Command inconsistency: tester-mode.md omits --phase implement (non-blocking)

orchestrator/routes/pipelines.py:2485 uses:

egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder --phase implement

sandbox/.claude/commands/tester-mode.md:92 uses:

egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder

The --phase implement flag is omitted from tester-mode.md. The pipelines.py version is more precise — it filters to the implement phase, which is what the tester actually needs. The tester-mode.md version would return checkpoints from all phases (including plan/refine), which is rarely useful. The code reviewer for this PR noted the same inconsistency. The checkpoint.md rule (line 9) also omits --phase implement for the tester use case.

Suggestion: Add --phase implement to both tester-mode.md and the tester entry in checkpoint.md to match pipelines.py.

2. conftest.py docker import fix — correct but worth a note (non-blocking)

The conftest.py change (orchestrator/tests/conftest.py:20-28) pre-imports the real docker module before test collection to prevent sys.modules.setdefault("docker", MagicMock()) calls from shadowing it. This is the correct fix for the test_docker_client.py failures caused by mock pollution from test_checkpoint_discovery.py.

The fix is sound and well-documented with a clear comment explaining the rationale. The try/except ImportError handles environments where docker isn't installed. This is a legitimate fix to a pre-existing fragility in the test suite's module mocking strategy.

3. Test coverage is solid (positive)

460 lines of tests in test_checkpoint_discovery.py covering:

  • Positive cases for all execution roles (tester, documenter, integrator)
  • Negative cases for analysis roles (architect, task_planner, risk_analyst)
  • Edge cases (None prompt, None issue_number, empty feedback, cycle 0)
  • Revision checklist gating on review_cycle > 0 and review_feedback
  • End-to-end tests verifying both general and role-specific hints
  • Ordering assertions (checkpoint hint appears after gap-finding section)

The test file follows the same docker mock pattern as test_pipeline_prompts.py.

No Blocking Issues

The PR is well-scoped and correctly implements the prompt-only approach (Option A). Specifically:

  • Orchestrator prompt changes (pipelines.py): Checkpoint hints are injected at the right locations — _build_role_context for the general pointer, _build_agent_prompt for role-specific commands, and _build_phase_scoped_prompt for revision cycle failed-session hints. Analysis roles are correctly excluded via the early return at line 1225.
  • Agent mode commands: Each role gets a contextually appropriate section. Coder-mode correctly scopes the hint to revision cycles only.
  • Rules updates: checkpoint.md gets a "When to Use" section mapping roles to commands. mission.md adds checkpoints to the context sources table and workflow.
  • Documentation: checkpoint-access.md adds documenter and coder-revision workflow examples. sdlc-pipeline.md lists checkpoints as a 4th context persistence mechanism.
  • .egg-state files: Pipeline state artifacts are properly updated for issue #887. These are expected tracking files.

The hint language is appropriately suggestive ("Before writing tests, review..." / "Find all changed files...") rather than imperative, which should minimize unnecessary checkpoint queries on first-cycle runs.

— Authored by egg

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contract Verification: PASS

All 14 tasks across 4 phases verified against the contract for issue #887.

Phase 1: Orchestrator prompt injection (5/5 tasks verified)

Task Acceptance Criteria Status
task-1-1 _build_role_context() includes egg-checkpoint context --pipeline for all execution roles Verified — line appended to "For More Context" section, analysis roles correctly excluded
task-1-2 Tester prompt includes egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder --phase implement Verified — exact command present in tester section
task-1-3 Documenter prompt includes egg-checkpoint context --pipeline $EGG_PIPELINE_ID --files Verified — exact command present in documenter section
task-1-4 Integrator prompt includes egg-checkpoint context and egg-checkpoint cost commands Verified — both commands present in integrator section
task-1-5 Revision checklist includes egg-checkpoint list --issue $EGG_ISSUE_NUMBER --status failed when review_cycle > 0 Verified — added to revision checklist, correctly absent for cycle 0 and no-feedback cases

Phase 2: Agent mode command updates (4/4 tasks verified)

Task Acceptance Criteria Status
task-2-1 tester-mode.md contains ## Review Prior Work with egg-checkpoint list command Verified
task-2-2 integrator-mode.md contains ## Pipeline Overview with context and cost commands Verified
task-2-3 documenter-mode.md contains ## Find Changed Files with context --files command Verified
task-2-4 coder-mode.md contains ## Revision Cycle Context with --status failed command Verified

Phase 3: Mission rule and checkpoint rule updates (3/3 tasks verified)

Task Acceptance Criteria Status
task-3-1 mission.md context sources table includes Checkpoints row Verified — row added with egg-checkpoint CLI location
task-3-2 mission.md "Gather context" mentions egg-checkpoint for multi-agent pipelines Verified — appended to Gather context step
task-3-3 checkpoint.md has "When to Use" section for tester, documenter, integrator, coder (revision) Verified — all four roles listed with commands

Phase 4: Verification (2/2 tasks verified)

Task Acceptance Criteria Status
task-4-1 All existing tests pass Verified — 6683 passed, 2 pre-existing failures (test_docker_client.py, unrelated)
task-4-2 Checkpoint hints appear in prompt outputs for each role Verified — 29 new tests in test_checkpoint_discovery.py cover all roles

Additional Changes (outside contract scope but consistent)

  • docs/guides/checkpoint-access.md: Added documenter and coder-revision workflow examples, updated tester example to use --pipeline flag, added cost command to integrator example
  • docs/guides/sdlc-pipeline.md: Added checkpoints as 4th context persistence mechanism and checkpoint discovery hints bullet
  • sandbox/.claude/README.md: Added checkpoint.md to core rules listing
  • orchestrator/tests/conftest.py: Fixed docker_client test failures from sys.modules mock pollution (import-ordering fix)
  • orchestrator/tests/test_checkpoint_discovery.py: 460-line test file with 29 tests covering all prompt-building functions

Contract Notes

  • The contract's top-level acceptance_criteria array is empty — all criteria are embedded per-task. Could not use egg-contract verify-criterion due to this structure.
  • All task statuses in the contract remain "pending" (not updated by the coder/integrator), but the implementation is complete and verified.
  • No contract violations found.
  • No orphaned code outside contract scope.
  • All changes are additive text — no logic changes, no schema migrations.

Verdict: Approve

All 14 acceptance criteria are met. Implementation is clean, well-tested, and consistent across all injection points. No gaps or violations found.

— Authored by egg

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

Addresses review feedback: the tester checkpoint list command in
tester-mode.md and checkpoint.md was missing --phase implement,
which would return checkpoints from all phases instead of just
the implement phase that testers actually need. Now matches the
authoritative version in orchestrator/routes/pipelines.py.
Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Added --phase implement to both sandbox/.claude/commands/tester-mode.md and sandbox/.claude/rules/checkpoint.md tester entries to match the authoritative version in orchestrator/routes/pipelines.py:2485. The tester only needs implement-phase checkpoints, so filtering by phase avoids returning plan/refine noise.

Pushed in 6f5a4f2.

— Authored by egg

Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the delta since b49d797. The only change is 6f5a4f2 which adds --phase implement to tester checkpoint commands in tester-mode.md and checkpoint.md, addressing the inconsistency flagged in the previous review. No agent-mode design concerns.

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contract Verification: Re-review (commit 6f5a4f2)

Verdict: PASS — All contract tasks verified.

Note: The branch was force-pushed since the previous review at b49d7978 (commit no longer in history). Full re-verification was performed.

Phase 1: Orchestrator prompt injection ✓

Task Acceptance Criteria Status
task-1-1 _build_role_context() includes egg-checkpoint context --pipeline for all execution roles Verified — Added at L1289+, analysis roles excluded by early return
task-1-2 Tester prompt includes egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder --phase implement Verified — Added in tester section of _build_agent_prompt()
task-1-3 Documenter prompt includes egg-checkpoint context --pipeline $EGG_PIPELINE_ID --files Verified — Added in documenter section
task-1-4 Integrator prompt includes egg-checkpoint context and egg-checkpoint cost commands Verified — Both commands added in integrator section
task-1-5 Revision checklist includes failed-session hint when review_cycle > 0 Verified — Added inside if review_feedback and review_cycle > 0: guard

Phase 2: Agent mode command updates ✓

Task Acceptance Criteria Status
task-2-1 tester-mode.md has ## Review Prior Work with egg-checkpoint list Verified
task-2-2 integrator-mode.md has ## Pipeline Overview with context + cost commands Verified
task-2-3 documenter-mode.md has ## Find Changed Files with context --files Verified
task-2-4 coder-mode.md has ## Revision Cycle Context with --status failed Verified

Phase 3: Mission rule and checkpoint rule updates ✓

Task Acceptance Criteria Status
task-3-1 mission.md context sources table includes Checkpoints row Verified
task-3-2 mission.md "Gather context" step mentions egg-checkpoint context Verified
task-3-3 checkpoint.md has "When to Use" with tester/documenter/integrator/coder guidance Verified — All four roles listed with role-specific commands

Phase 4: Verification ✓

Task Acceptance Criteria Status
task-4-1 All existing tests pass Verified — 6685 passed, 85 skipped. 2 pre-existing failures in test_docker_client.py (on main)
task-4-2 Checkpoint hints confirmed in prompts for all roles Verified — 29 new tests in test_checkpoint_discovery.py covering all roles and edge cases

Additional Changes (non-contract)

  • orchestrator/tests/conftest.py: Docker import fix to prevent sys.modules mock pollution — correctly addresses a pre-existing test ordering issue
  • docs/guides/checkpoint-access.md: Added documenter and coder-revision workflow examples, updated tester example to use --pipeline, added cost command to integrator example
  • docs/guides/sdlc-pipeline.md: Added checkpoints as 4th context persistence mechanism
  • sandbox/.claude/README.md: Added checkpoint.md to core rules listing

All additional changes are consistent with the contract's intent and do not violate any criteria.

Contract Integrity

  • No regressions on previously verified criteria (force-push, so full re-verification done)
  • No orphaned code outside contract scope
  • No contract violations found
  • All 14 task acceptance criteria met

— Authored by egg

Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: Add checkpoint discovery hints to agent prompts

Delta reviewed: b49d79788d..6f5a4f2 (feedback-addressing commit)

Previous Feedback: Addressed

The --phase implement inconsistency has been fixed correctly. The tester checkpoint command now includes --phase implement in all three locations:

  • sandbox/.claude/commands/tester-mode.md — fixed
  • sandbox/.claude/rules/checkpoint.md — fixed
  • docs/guides/checkpoint-access.md — already correct

All five injection points (pipelines.py, tester-mode.md, checkpoint.md, checkpoint-access.md, and the test assertions) now use the same command: egg-checkpoint list --pipeline $EGG_PIPELINE_ID --agent-type coder --phase implement.

Full Review Summary

Reviewed all 21 changed files across the PR. No blocking issues found.

Orchestrator prompt changes (pipelines.py): Five clean insertions across _build_role_context, _build_agent_prompt (tester/documenter/integrator), and _build_phase_scoped_prompt. Analysis roles are correctly excluded via the early return at line 1225. Checkpoint hints are placed at the right location in each section — after the relevant task context, not before. The revision checklist hint is properly gated behind review_cycle > 0 and review_feedback.

Agent mode commands (.md files): Each role gets a contextually appropriate section placed before the Quality Checklist. Commands match the authoritative versions in pipelines.py.

Rules and docs: checkpoint.md "When to Use" section, mission.md context sources table, and checkpoint-access.md workflow examples are all consistent with each other and with the orchestrator prompts.

Tests (test_checkpoint_discovery.py): 29 tests covering positive cases for all execution roles, negative cases for all analysis roles, edge cases (None prompt, None issue_number, empty feedback, cycle 0), ordering assertions, and end-to-end flows. Test structure follows the existing test_pipeline_prompts.py pattern.

conftest.py fix: Pre-importing docker before test collection prevents sys.modules.setdefault("docker", MagicMock()) from shadowing the real package. Correct fix with proper try/except ImportError guard. This addresses a real test isolation issue.

No security, correctness, robustness, or design issues identified. All changes are additive prompt text — no logic modifications, no schema changes, no breaking changes.

— Authored by egg

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

@james-in-a-box
Copy link
Contributor Author

egg is addressing review feedback...

@james-in-a-box
Copy link
Contributor Author

egg feedback addressed. View run logs

7 previous review(s) hidden.

@jwbron jwbron merged commit 3dd7f51 into main Feb 23, 2026
39 of 40 checks passed
james-in-a-box bot pushed a commit that referenced this pull request Feb 23, 2026
Resolved conflicts in two globally-named .egg-state files that were
written by different pipelines (issue #871 vs #887) — the exact class
of conflict this PR is designed to prevent:

- .egg-state/agent-outputs/risk_analyst-output.json: kept PR branch
  version (issue #871 risk assessment)
- .egg-state/checks/implement-results.json: kept PR branch version
  (issue #871 check results)

Both files are ephemeral pipeline state artifacts. Main's versions
were from a different pipeline run (issue #887, PR #894).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve checkpoint discoverability for agents

1 participant